Skip to content

Conversation

@bilby91
Copy link

@bilby91 bilby91 commented Apr 12, 2023

PR Summary

This pull requests adds support for AF_UNIX sockets.

The implementation modifies contrib/win32/win32compat/socketio.c socketio_acceptEx implementation so that it can handle sa_family == AF_UNIX. contrib/win32/win32compat/w32fd.c has also been modified in order to remove the path that uses fileio_afunix_socket.

The unix socket name is conditionally using C:\\tmp\\ssh-XXXXXXXXXX instead of /tmp/ssh-XXXXXXXXXX since it doesn't work correctly when using ssh in the windows host. I'm not sure if /tmp is the best directory to host the sock in Windows but I left it that way since unix users will expect the agent socks to live there. The caveat here is that you need to create the /tmp folder in Windows in order to properly work.

One known gap that the implementation is missing is conditionally using AF_UNIX. I know that a subset of Windows version support it but I'm not sure what is the best way to handle this. If I can get any guidance in terms of how to approach it I can test it and modify it.

PR Context

The motivation around working on this patch was to get support for SSH Agent Forwarding. With this patch applied, ssh-add -l will list the keys from the local system. I've also tested using git to clone repositories and it's successfully able to use the local system private keys through the agent.

I've also tested that WSL can successfully use the SSH Agent sock as long as the SSH_AUTH_SOCK is defined in the bash session.

This pull request could potentially solve the following issues:

PowerShell/Win32-OpenSSH#1461
PowerShell/Win32-OpenSSH#1761
PowerShell/Win32-OpenSSH#1462

@bilby91 bilby91 force-pushed the add-af-unix-support branch from 57dabcc to 3a77587 Compare April 12, 2023 22:03
@bilby91
Copy link
Author

bilby91 commented Apr 13, 2023

@tgauth Would it be fine if we guard the different code paths with #ifdef HAVE_AFUNIX_H ?

@tgauth
Copy link
Collaborator

tgauth commented Apr 13, 2023

@tgauth Would it be fine if we guard the different code paths with #ifdef HAVE_AFUNIX_H ?

I don't see why not. If you haven't seen it yet, can add it to: https://github.com/PowerShell/openssh-portable/blob/latestw_all/contrib/win32/openssh/config.h.vs

Thanks for working on this!

@bilby91
Copy link
Author

bilby91 commented Apr 13, 2023

@tgauth Thanks for the quick feedback! I added the HAVE_AFUNIX_H guard. Visual Studio doesn't seem to be picking up the #define statement from config.h.vs, is there anything else I need to run in order to make the change effective when compiling ? Want to make sure stuff is working properly with my tests.

@tgauth
Copy link
Collaborator

tgauth commented Apr 13, 2023

@tgauth Thanks for the quick feedback! I added the HAVE_AFUNIX_H guard. Visual Studio doesn't seem to be picking up the #define statement from config.h.vs, is there anything else I need to run in order to make the change effective when compiling ? Want to make sure stuff is working properly with my tests.

There's a config.h file in the root folder that gets created from config.h.vs during compilation of the config.vcxproj - can you confirm that's picking up the #define statement?

@bilby91
Copy link
Author

bilby91 commented Apr 13, 2023

@tgauth Yes, I can confirm that config.h in the root got updated.

/* Use PIPES instead of a socketpair() */
#define USE_PIPES 1

/* define 1 if afunix.h is available */
#define HAVE_AFUNIX_H 1

Still, it doesn't seem to be compiling with HAVE_AFUNIX_H 1.

@tgauth
Copy link
Collaborator

tgauth commented Apr 13, 2023

@tgauth Yes, I can confirm that config.h in the root got updated.

/* Use PIPES instead of a socketpair() */
#define USE_PIPES 1

/* define 1 if afunix.h is available */
#define HAVE_AFUNIX_H 1

Still, it doesn't seem to be compiling with HAVE_AFUNIX_H 1.

oh, my bad - seems like config.h needs to be explicitly included in posix_compat where HAVE_AFUNIX_H is used. Try putting #include "..\..\..\config.h" in both socketio.c and w32fd.c, or try defining it in w32fd.h since that's imported by both?

@bilby91
Copy link
Author

bilby91 commented Apr 13, 2023

@tgauth That did the trick, thanks! I added the include statement in both files because doing so in w32fd.h had issues with signal.c.

I'm testing now a different approach for temp folder where the unix socket will live. I'm changing the implementation so that it uses fileapi.h GetTempPath to replace the /tmp. Do you think this is a better location ?

@tgauth
Copy link
Collaborator

tgauth commented Apr 13, 2023

I'm testing now a different approach for temp folder where the unix socket will live. I'm changing the implementation so that it uses fileapi.h GetTempPath to replace the /tmp. Do you think this is a better location ?

yep, that makes sense to me!

@bilby91
Copy link
Author

bilby91 commented Apr 13, 2023

@tgauth I think I have addressed the different issues now. Take a look at the code that generates the temporal directory.

Last commit was pushed using this feature 😎

@bilby91
Copy link
Author

bilby91 commented Apr 13, 2023

@tgauth Was looking at the failing tests. They generally timeout ? Some tests seem to have failed but they look unrelated (at first glance). I wonder if having #define HAVE_AFUNIX_H 1 by default is causing issues in the ADO environment.

Any insight ?

@bilby91
Copy link
Author

bilby91 commented Apr 14, 2023

I think I know what is going on. Will try to fix it over the weekend.

@bilby91 bilby91 force-pushed the add-af-unix-support branch from dafedf8 to 8ca026c Compare April 14, 2023 22:59
@bilby91
Copy link
Author

bilby91 commented Apr 15, 2023

@tgauth Tests are passing now :)

I had to change a few other calls to socket() due to the ssh-agent's named pipes implementation. I don't love how it's done but the path for the named pipe is fixed so comparing should be safe to decide (based on my understanding and local testing).

I think that a subsequent change could replace the named pipes usage in agent.c and use a native AF_UNIX socket to simplify the implementation.

I'm moved AGENT_PIPE_ID to config.h because I was planning to use it with strcmp but later realized that it's a wchar_t instead of a char * so I would need to convert it before comparing. I can do the transformation or compare with the hardcoded version. What do you think ?

Anything else I might be missing ?

Thanks so much looking at the changes!

@tgauth
Copy link
Collaborator

tgauth commented Apr 17, 2023

I know that a subset of Windows version support it but I'm not sure what is the best way to handle this. If I can get any guidance in terms of how to approach it I can test it and modify it.

Based on https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/ support started with Windows 10. We have a similar check already defined that I think can also be utilized here: https://github.com/PowerShell/openssh-portable/blob/latestw_all/contrib/win32/win32compat/misc_internal.h#L15

Anything else I might be missing ?

Ideally, would like to have some test coverage for agent forwarding! There are some upstream agent tests that the CI currently skips - https://github.com/PowerShell/openssh-portable/blob/latestw_all/regress/agent.sh, https://github.com/PowerShell/openssh-portable/blob/latestw_all/contrib/win32/openssh/bash_tests_iterator.ps1#L189. If the tests applicable to agent forwarding could be modified as necessary for Windows and not skipped, that would be great! A single bash test can be run with the following command:
.\bash_tests_iterator.ps1 -OpenSSHBinPath "C:\openssh-portable\bin\x64\Release\" -BashTestsPath "c:\openssh-portable\regress" -shellpath "c:\cygwin64\bin\sh.exe" -TestFilePath "C:\openssh-portable\regress\agent.sh"

@tgauth
Copy link
Collaborator

tgauth commented Apr 17, 2023

I'm moved AGENT_PIPE_ID to config.h because I was planning to use it with strcmp but later realized that it's a wchar_t instead of a char * so I would need to convert it before comparing. I can do the transformation or compare with the hardcoded version. What do you think ?

For maintainability, I'm in favor of doing the transformation so that AGENT_PIPE_ID is the singular definition of the pipe name, as opposed to hardcoding it in a second place.

@bilby91
Copy link
Author

bilby91 commented Apr 17, 2023

@tgauth Thanks for the feedback!

I'll take a look at the tests and doing the transformation for AGENT_PIPE_ID!

In regards to using IsWindows8OrGreater/IsWin7OrLess, you are thinking of replacing the HAVE_AFUNIX_H with calls to those functions at runtime ? Not sure how Microsoft build process works but HAVE_AFUNIX_H could be defined only for Win7 >.

Just want to make sure I fully understand how we want to protect the paths.

@bilby91 bilby91 force-pushed the add-af-unix-support branch from 795b52e to fd096f5 Compare April 19, 2023 12:34
@bilby91
Copy link
Author

bilby91 commented Apr 19, 2023

@tgauth I was able to include a fraction of the agent.sh tests! Had to fix a bunch of stuff in the test-exec.sh, which was probably preventing them from running originally.

I had to continue excluding some of the tests for two reasons:

  1. Some tests try to use different agents but Windows only supports one of them. I think this limitation can go away if we implement the agent with AF_UNIX and follow the same strategy as OpenSSH-portable implementation.
  2. I wasn't able to get certificate-based key authentication to work. I saw some other tests skipped as well so I'm assuming this is currently not working.

Having said that, with these changes, we have a little bit more coverage while keeping all the other tests passing.

I also made the change to compare with AGENT_PIPE_ID. I think the last piece remaining would be to decide what to do with HAVE_AFUNIX_H and IsWindows8OrGreater. I think we could keep it as it is with HAVE_AFUNIX_H and decide at build time if the feature needs to be included or not, or, I can change the places where I used HAVE_AFUNIX_H and use IsWindows8OrGreater instead. Let me know what you think!

@tgauth
Copy link
Collaborator

tgauth commented May 2, 2023

Is there anything on my end that you think I can help with in the meantime?

Not that I can think of right now, but will keep you posted - thanks for your patience!

@bilby91
Copy link
Author

bilby91 commented Sep 30, 2023

@tgauth was wondering if you had any updates about this topic!

Thanks!

@maertendMSFT
Copy link

This is delayed due to focusing on more simple PRs into the 9.4 release as well as an in depth security review of the behavior change. Because this is a new feature, not a bug fix, it requires further scrutiny.

@bilby91
Copy link
Author

bilby91 commented Oct 2, 2023

@maertendMSFT I understand. Thanks for the feedback!

@arixmkii
Copy link

arixmkii commented Dec 5, 2023

@bilby91 Do you know if this would also work for mux and ControlPath or will it require more work to get it working?

@bilby91
Copy link
Author

bilby91 commented Dec 5, 2023

@arixmkii To be honest I don't know.

Any chance you have local development environment where you could test it ? If not I might be able to help over the weekend.

@arixmkii
Copy link

arixmkii commented Dec 6, 2023

@bilby91 I rebased your changes to 9.4.0.0 and built locally. Unfortunately it didn't work, but the error is suspicious, I don't have logs at hand, but it was about socket naming. Might be an error at command parsing level. I will debug it at the end of this week to figure out more details.

@arixmkii
Copy link

arixmkii commented Jan 9, 2024

Answering my question. No Mux will not work as everything mux is now covered by no_ops stubs. I will try to investigate the possible subset of Mux, which could be achieved with AF_UNIX additions from this PR.

Copy link

@JojOatXGME JojOatXGME left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bilby91, I just noticed your PR. I am just a bit confused as it looks like it might contradict parts of what I have researched over the weekend. (See also PowerShell/Win32-OpenSSH#1024 (comment).)

I also looked over your PR and made a few comments. However, I was missing some context and could therefore not really understand all of it. (Also note that I am not affiliated with Microsoft.)

What I don't understand at your PR is what has agent forwarding to do with whether SSH on Windows uses unix domain sockets instead of named pipes? This seems like two unrelated topics for me. I assume the changes in session.c apply to sshd? Is it possible that you could also have used named pipes here and then skipped all the other changes regarding unix domain sockets?

I've also tested that WSL can successfully use the SSH Agent sock as long as the SSH_AUTH_SOCK is defined in the bash session.

Are you talking about WSL1? Because I thought that WSL2 does not support interoperability with unix domain sockets (yet).

Comment on lines +333 to +336
if(wcstombs(pipeid, AGENT_PIPE_ID, len + 1) != (size_t) -1 && strcmp(addr->sun_path, pipeid) == 0)
return w32_fileio_socket(SOCK_STREAM, 0);
else
return w32_unix_socket(SOCK_STREAM, 0);
Copy link

@JojOatXGME JojOatXGME Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the code correctly, you could technically just call w32_socket(AF_INET, SOCK_STREAM, 0) and w32_socket(AF_UNIX, SOCK_STREAM, 0), instead of introducing two new methods? Although I understand that AF_INET would be misleading.

Besides, instead of checking whether addr->sun_path equals AGENT_PIPE_ID, I guess you could also just check the prefix. If the path starts with \\.\pipe\, I think you can assume that it is a named pipe. This way, the solution stays compatible with other named pipes as well. For example, if a user uses Pageant (PuTTY authentication agent), they might also use named pipes, but the pipe may be named //./pipe/pageant.<username>.<random_sequence>. (This means you should probably also accept both, slash (/) and backslash (\) in the prefix.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally I was thinking of behaving this way just for this implementation but I think the prefix makes sense in order to support other implementations.

authfd.c Outdated
Comment on lines 100 to 106
if ((sock = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
#ifdef HAVE_AFUNIX_H
sock = w32_afunix_socket(&sunaddr);
#else
sock = socket(AF_UNIX, SOCK_STREAM, 0);
#endif

if (sock == -1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this #ifdef condition? Shouldn't you be able to always call w32_afunix_socket as this method already handles the #ifdef condition internally? Same question applies to misc.c and mux.c.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. Will look into simplifying! Thanks

Comment on lines +125 to +129
#ifdef HAVE_AFUNIX_H
context->accept_socket = socket(addr.ss_family, SOCK_STREAM, IPPROTO_IP);
#else
context->accept_socket = socket(addr.ss_family, SOCK_STREAM, IPPROTO_TCP);
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this #ifdef condition necessary? If IPPROTO_TCP does not work for some reason, would it be possible to always use IPPROTO_IP (or just 0)? man socket(2) suggests that specifying 0 (which is the value of IPPROTO_IP) will use the default, which again is TCP.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need to try this out. I don't recall if IPPROTO_TCP was not working.

@bilby91
Copy link
Author

bilby91 commented Feb 25, 2024

@bilby91, I just noticed your PR. I am just a bit confused as it looks like it might contradict parts of what I have researched over the weekend. (See also PowerShell/Win32-OpenSSH#1024 (comment).)

I also looked over your PR and made a few comments. However, I was missing some context and could therefore not really understand all of it. (Also note that I am not affiliated with Microsoft.)

What I don't understand at your PR is what has agent forwarding to do with whether SSH on Windows uses unix domain sockets instead of named pipes? This seems like two unrelated topics for me. I assume the changes in session.c apply to sshd? Is it possible that you could also have used named pipes here and then skipped all the other changes regarding unix domain sockets?

I've also tested that WSL can successfully use the SSH Agent sock as long as the SSH_AUTH_SOCK is defined in the bash session.

Are you talking about WSL1? Because I thought that WSL2 does not support interoperability with unix domain sockets (yet).

@JojOatXGME Thanks for taking a look at the code!

It has been almost an year since I worked on this change but I'm pretty positive I tried using named pipes instead of the AF_UNIX sockets and it was not working. Could have been lack of understanding on my side so it's highly likely than an implementation based on them is possible.

Regarding WSL1 vs WSL2, I think I was using WSL 2 in the machine I used for testing. Unfortunately I don't have that machine any more but I can look into double checking that in a different one potentially.

Are you able to test yourself by any chance ?

@JojOatXGME
Copy link

JojOatXGME commented Feb 26, 2024

Are you able to test yourself by any chance ?

I have to see. I haven't worked on the project before. I actually did not work on any C-project for quite some time. I would have to set up some dev environment to build the project. Anyway, I will try to play with it within the next two weeks.

PS: Assuming we could use named pipes in session.c for the agent forwarding, we might be able to split this PR into three smaller changes: 1. agent forwarding, 2. client side support for native unix domain socket, 3. option to use native unix domain sockets for ssh-agent and sshd (agent forwarding). Maybe splitting the topic would simplify the security review? The biggest security risk or uncertainty comes from 3. in my opinion. If I find time to play with it, this is what I will properly try to check.

mux.c Outdated
if ((sock = socket(PF_UNIX, SOCK_STREAM, 0)) == -1)
#ifdef HAVE_AFUNIX_H
sock = w32_afunix_socket(&addr);
#elif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this elif intentional? It doesn't even compile.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thedarkcolour That's wrong, it should be an #else. I'll fix it.

@RailWar
Copy link

RailWar commented Jan 9, 2025

I changed this code in channels.c connect_next()

#ifdef HAVE_AFUNIX_H
if ((sock = w32_afunix_socket(&sunaddr)) == -1) {
#else
if ((sock = socket(cctx->ai->ai_family, cctx->ai->ai_socktype,
cctx->ai->ai_protocol)) == -1) {
#endif

and now I can forward tcp port to unix socket.

@vladertel
Copy link

@tgauth Any plans on merging this?

@wabscale
Copy link

It would be great if this was merged.

@bilby91
Copy link
Author

bilby91 commented Feb 25, 2025

Not sure where this stands as per @tgauth, but, I applied some of the suggestions from @JojOatXGME and fixed the #elif bug.

@uhx
Copy link

uhx commented May 29, 2025

It has already been two years. Is there any chance of merging that?
@tgauth

@JojOatXGME
Copy link

FYI, I also took a closer look at the code after my initial review. I tried to split up the changes into three separate topics, as described at #674 (comment). Anyway, while doing that, I got the impression that the PR is mixing up two different abstraction layers. Specifically, some changes are made inside the Socket compatibility layer, and therefore affect all sockets, while other changes have been made in the usage of the compatibility layer and only affect a few specific use cases.

It might also be relevant that the project probably tries to minimize potential merge conflicts with upstream OpenSSH in the future. I could imagine that changes outside the compatibility layer might therefore be significantly less likely to be integrated. But that is just a guess.

Unfortunately, I didn't have the time and energy to come up with an alternative solution based on my feedback. I am also still not affiliated with Microsoft. I just thought I share my perspective. Since we haven't received much feedback from MS, I don't know whether they have similar concerns, completely different concerns, or whether they just haven't looked at the PR in detail.

@uhx
Copy link

uhx commented May 29, 2025

@JojOatXGME yeah, I clearly understand this stance and in sounds reasonable, but it would be helpful to hear the official position from maintainers, since there is no rule covering changes to non-compatibility layer. That clarification would save contributors time. I'm willing to help somehow, but this MR is a bit intimidating -- I have no experience with this kind of development, and it will take me a lot time.

I also hope this MR somehow will bring us closer to resolving this issue PowerShell/Win32-OpenSSH#1328 (i might be wrong)

@anonhostpi
Copy link

anonhostpi commented Jun 11, 2025

@uhx @vladertel @wabscale

From an earlier comment:

This is delayed due to focusing on more simple PRs into the 9.4 release as well as an in depth security review of the behavior change. Because this is a new feature, not a bug fix, it requires further scrutiny.

For context:

Security conversations on Microsoft products do not occur over night and in a lot of cases extend out for months (if not years).

This happens, because Microsoft is an enterprise organization and needs a full and complete understanding of the implications of each potential security vulnerability before shipping it out. That is especially applicable to something like OpenSSH, which is a crucial piece of critical infrastructure.

The thing I would like to highlight the most is that many of these conversations are government-mandated by the countries Microsoft offers their products in. This is to ensure that any features are law- and security-compliant before being shipped out in the applicable nations.

That means there is very likely a very thick layer of red tape involved with a feature like this, even if the feature is simplistic. That also means the process is likely very slow as any process involving government regulation is painfully slow by default.

EDIT: not a Microsoft employee, but I am a government IT worker and am familiar with these sorts of conversations.


sock = socket(PF_UNIX, SOCK_STREAM, 0);
if (sock == -1) {
if ((sock = w32_afunix_socket(&sunaddr)) == -1) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we replace this socket call as well:

if ((sock = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
?

In my use case I build ssh-agent from the upstream OpenSSH's ssh-agent.c, not Microsoft's custom implementation under contrib/win32/win32compat/ssh-agent/agent.c. Thus to enable ssh.exe to communicate via socket to ssh-agent.exe I need to replace the socket call in authfd.c with your w32_afunix_socket.

Microsoft's ssh-agent uses Windows pipes for IPC so the socket call in authfd.c is unused.

# These are the known failed testcases.
# transfer.sh, rekey.sh tests fail on CygWin v3.4.0, but succeeds with v3.3.6
$known_failed_testcases = @("agent.sh", "key-options.sh", "forward-control.sh", "integrity.sh", "krl.sh", "cert-hostkey.sh", "cert-userkey.sh", "percent.sh", "transfer.sh", "rekey.sh")
$known_failed_testcases = @("key-options.sh", "forward-control.sh", "integrity.sh", "krl.sh", "cert-hostkey.sh", "cert-userkey.sh", "percent.sh", "transfer.sh", "rekey.sh")
Copy link

@Onur-TURAN Onur-TURAN Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't use agent.sh? regress folder have agent.sh

#define USE_PIPES 1

/* Define name for the ssh-agent Windows named pipe */
#define AGENT_PIPE_ID L"\\\\.\\pipe\\openssh-ssh-agent"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to consider this issue specifically, and I have the following reservations in mind. maybe This command has probability an LFL vulnerability. For example, the X user has 600 chmod permissions. But the folder path '\pipe\openssh-ssh-agent' may have directory path access. If you must define another folder path, use a command line shortcut or pipeline. This will only affect the permission of one file and will not affect any other folder paths.

tmp_addr = (SOCKADDR*)&tmp_addr4;
tmp_addr_len = sizeof(tmp_addr4);
#ifdef HAVE_AFUNIX_H
} else if (name->sa_family == AF_UNIX) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this project runs on a Unix operating system without a hybrid kernel, the connection type requirements should avoid pointer access. On Unix systems, memory assets cannot be controlled while the process is running, which could lead to potential vulnerabilities.

/* Use Windows temporal directory instead of unix `/tmp` folder */
static char tmp_file_path[MAX_PATH];
DWORD tmp_path_len = GetTempPath(MAX_PATH, tmp_file_path);
if (tmp_path_len > MAX_PATH || tmp_file_path == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp_file_path should be tmp_file_len here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tmp_path_len

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.